The problem of filtering information from large correlation matrices 
is of great importance in many applications. We have recently proposed 
the use of the Kullback-Leibler distance to measure the performance of 
filtering algorithms in recovering the underlying correlation matrix when 
the variables are described by a multivariate Gaussian distribution. Here we 
use the Kullback-Leibler distance to investigate the performance of filtering 
methods based on Random Matrix Theory and on the shrinkage technique. 
We also present some results on the application of the Kullback-Leibler 
distance to multivariate data that are not Gaussian distributed. 

PACS numbers: 02.50.Sk, 05.45.Tp, 05.40.Ca, 02.10.Yn, 89.65.Gh 

1. Introduction 

In many applications, monitoring the dynamics of a system provides
multivariate time series, and the number of monitored variables is often
very high. Examples include gene expression level measurements in
microarray experiments (1), fMRI experiments (0), and the analysis of
economic or financial data such as firm growth rates or stock price
returns ([1; 0; H). A common way to investigate the interaction between
the variables of the system is through the cross-correlation matrix. Like
any statistical estimator, the sample correlation matrix is unavoidably
affected by the statistical uncertainty due to the finite size of the
sample. This problem becomes extremely important when the number of
investigated variables is comparable with the number of records of each
variable. To cope with the statistical uncertainty of the sample
correlation matrix, one needs filtering methods able to remove at least
part of the noise from the correlation matrix. Many techniques have been
proposed in the literature to filter information from the correlation
matrix. However, unless one knows in advance the model describing the
system dynamics, it is difficult to assess the goodness of a filtering
procedure. Recently (6) we proposed the use of the Kullback-Leibler (KL)
distance as a method of assessing the performance of correlation matrix
filtering procedures. There are several reasons why we believe the KL
distance is a good performance estimator. The main reason is that we
proved (6) that for Gaussian distributed variables the expected values of
the KL distance are independent of the underlying model. This fact allowed
us to devise a method to assess the performance of a filtering method in
recovering the underlying model without any knowledge of the model itself.

In this paper we consider filtering procedures based on Random Matrix
Theory (RMT), hierarchical clustering and shrinkage, and we use the KL
distance to evaluate their performance. We consider both artificial and
real data samples. Finally, we present an extension of the KL distance to
an important class of non-Gaussian distributions, specifically the
multivariate Student's t-distribution.

2. Kullback-Leibler distance for Gaussian variables 

The KL distance (see for instance (0; 0)) or mutual entropy is a measure
of the distance between two probability densities, say $p$ and $q$, which
is defined as $K(p, q) = E_p[\log(p/q)]$, where $E_p[\cdot]$ indicates the
expectation value with respect to the probability density $p$. The KL
distance is asymmetric, since the expectation value is evaluated according
to the distribution $p$.

Here we consider the KL distance between multivariate probability
distributions, and we indicate with $n$ the dimension of the space spanned
by the variables. Let us first consider the case of multivariate Gaussian
variables. Without loss of generality we assume that the variables have
zero mean and unit variance. In this case, the Gaussian multivariate
probability density function $P(\Sigma, X)$ is completely defined by the
correlation matrix $\Sigma$ of the system. Given two different probability
density functions $P(\Sigma_1, X)$ and $P(\Sigma_2, X)$, we have

$$K(P(\Sigma_1, X), P(\Sigma_2, X)) = \frac{1}{2} \left[ \log \frac{|\Sigma_2|}{|\Sigma_1|} + \mathrm{tr}\left(\Sigma_2^{-1} \Sigma_1\right) - n \right], \qquad (1)$$

where $|\Sigma|$ indicates the determinant of $\Sigma$. From now on we
indicate $K(P(\Sigma_1, X), P(\Sigma_2, X))$ simply with
$K(\Sigma_1, \Sigma_2)$.
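
As an illustration, the closed form of the KL distance between two
zero-mean Gaussian densities, K = (1/2) [log(|Sigma_2| / |Sigma_1|) +
tr(Sigma_2^{-1} Sigma_1) - n], can be evaluated numerically with a few
lines of NumPy. This is only a minimal sketch: the function name
`kl_gaussian` and its argument names are our own choices for the example.

```python
import numpy as np

def kl_gaussian(sigma1, sigma2):
    """KL distance K(Sigma_1, Sigma_2) between two zero-mean Gaussians."""
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    n = sigma1.shape[0]
    # slogdet avoids overflow/underflow of the determinant for large n
    sign1, logdet1 = np.linalg.slogdet(sigma1)
    sign2, logdet2 = np.linalg.slogdet(sigma2)
    if sign1 <= 0 or sign2 <= 0:
        raise ValueError("both matrices must have a positive determinant")
    # tr(Sigma_2^{-1} Sigma_1) computed without forming the explicit inverse
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))
    return 0.5 * (logdet2 - logdet1 + trace_term - n)
```

Swapping the two arguments generally changes the result, which is the
asymmetry of the KL distance mentioned above.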

Consider the Pearson sample correlation matrix $C$ obtained from the
observation of the $n$ variables, each for $T$ records. The sample
correlation matrix is different from the true correlation matrix of the
system. The Pearson estimator of the correlation matrix has the advantage
that sample covariance matrices of finite variance variables belong to the
ensemble of Wishart random matrices, and many statistical properties of
Wishart matrices are known. Since different realizations of the process
give rise to different sample correlation matrices, a KL distance having
one or two sample correlation matrices as arguments is a function of one
or two random matrices. We investigated the statistical properties of the
KL distance involving sample correlation matrices of multivariate Gaussian
random variables in Ref. B).

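Let $C_1$ and $C_2$ be two sample correlation matrices obtained from two
independent realizations of the system, both of length $T$. By making use
of the theory of Wishart matrices ([§) we obtain ([q) that

$$E[K(\Sigma, C_1)] = \frac{1}{2} \left\{ n \log\frac{2}{T} + \sum_{p=T-n+1}^{T} \frac{\Gamma'(p/2)}{\Gamma(p/2)} + \frac{n(n+1)}{T-n-1} \right\}, \qquad (2)$$

$$E[K(C_1, \Sigma)] = \frac{1}{2} \left\{ n \log\frac{T}{2} - \sum_{p=T-n+1}^{T} \frac{\Gamma'(p/2)}{\Gamma(p/2)} \right\}, \qquad (3)$$

$$E[K(C_1, C_2)] = \frac{1}{2} \, \frac{n(n+1)}{T-n-1}, \qquad (4)$$

where $\Gamma(x)$ is the usual Gamma function and $\Gamma'(x)$ is the
derivative of $\Gamma(x)$. We also obtained (6) the asymptotic expectation
value of the standard deviation of $K(C_1, \Sigma)$ by using the Bartlett
statistics (P). Specifically, if $T \gg 1$, $n \gg 1$ and
$Q = T/n \gg 1$, we infer that the standard deviation of $K(C_1, \Sigma)$
is $\sigma_K \simeq 1/(2Q)$.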

The most important property of the expectation values given in Eqs.
[2]-[4] is that they are independent of $\Sigma$, i.e. they are
independent of the specific true correlation matrix. This fact implies
that (i) the KL distance is a good measure of the statistical uncertainty
of the correlation matrix due to the finite length of the data series and
(ii) the expected value of the KL distance is known even when the
underlying model hypothesized to describe the system is unknown.
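
To make the model independence concrete, the three expectation values can
be computed from $n$ and $T$ alone. The sketch below is a hedged
illustration, not the original implementation: it uses the Wishart results
E[K(Sigma, C)] = (1/2) [n log(2/T) + sum_p psi(p/2) + n(n+1)/(T-n-1)],
E[K(C, Sigma)] = (1/2) [n log(T/2) - sum_p psi(p/2)] and
E[K(C_1, C_2)] = (1/2) n(n+1)/(T-n-1), with p running from T-n+1 to T and
psi = Gamma'/Gamma. The helper `digamma` is a small pure-Python
approximation written for this example, and `expected_kl` is a
hypothetical name. These expressions are exact for sample covariance
matrices of zero-mean Gaussian data; using them for correlation matrices
follows the text above.

```python
import math

def digamma(x):
    """psi(x) = Gamma'(x) / Gamma(x) for x > 0."""
    result = 0.0
    while x < 6.0:                      # recurrence psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def expected_kl(n, T):
    """Return (E[K(Sigma, C)], E[K(C, Sigma)], E[K(C_1, C_2)])."""
    if T <= n + 1:
        raise ValueError("the expectations require T > n + 1")
    psi_sum = sum(digamma(p / 2.0) for p in range(T - n + 1, T + 1))
    e_c1_c2 = 0.5 * n * (n + 1) / (T - n - 1)
    e_sigma_c = 0.5 * (n * math.log(2.0 / T) + psi_sum) + e_c1_c2
    e_c_sigma = 0.5 * (n * math.log(T / 2.0) - psi_sum)
    return e_sigma_c, e_c_sigma, e_c1_c2
```

For instance, `expected_kl(100, 748)` uses the values of $n$ and $T$
considered later in the paper. No correlation matrix enters the
computation, and the first two values add up to the third.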

3. Comparison of filtering procedures 

The KL distance can be used to quantify and compare the performance of
different filtering procedures of correlation matrices (g*). A good
filtering procedure should have two important properties: (i) it should
remove the "right" amount of noise from the data in order to recover the
signal and (ii) it should produce filtered matrices which are stable when
one makes different observations of the same system. These two
requirements are often in competition with each other. In real cases one
does not know the true correlation matrix, therefore it seems impossible
to know whether a filtering procedure is removing the right amount of
noise. However, the above-mentioned property of the expected value of the
KL distance, namely its independence of the model correlation matrix, can
be used to estimate the goodness of the filtering procedure. The procedure
we propose to evaluate the performance of a filtering method is the
following.

Suppose we are given a data sample $X$ and we have our favorite filtering
procedure. We propose to generate $M$ bootstrap replicas $X_i$
($i = 1, \ldots, M$) of the data. For each replica $X_i$ we then compute
the sample correlation matrix $C_i$ and apply the filtering procedure,
obtaining the filtered matrix $C_i^{filt}$. In order to measure the
stability of the filtering procedure, we consider the average over the
replicas of the quantity $K(C_i^{filt}, C_j^{filt})$. An optimal filtering
procedure should be perfectly stable (i.e.
$\langle K(C_i^{filt}, C_j^{filt}) \rangle = 0$), because from each
realization the filtering recovers the model matrix. In order to measure
the filtered information we consider the average of $K(C_i, C_i^{filt})$
over the replicas. This quantity measures the information present in the
sample correlation matrix that has been discarded by the filtering
procedure. We have seen above that for Gaussian variables the KL distance
$\langle K(C_i, \Sigma) \rangle$ is different from zero and independent of
the model $\Sigma$ (see Eq. [3]). Therefore, if our filtering procedure is
recovering the true underlying model, we should expect
$\langle K(C_i, C_i^{filt}) \rangle$ to be equal to the right hand side of
Eq. [3]. We thus have an optimal value for both the stability and the
information expected from an optimal filtering, and these values are
independent of the underlying model. We will represent the result of the
analysis in a plane where the $x$ axis is related to the stability
$\langle K(C_i^{filt}, C_j^{filt}) \rangle$ and the $y$ axis is related to
the information $\langle K(C_i, C_i^{filt}) \rangle$. In this plane the
optimal point, labeled $\Sigma$, has coordinate $x = 0$ and $y$ equal to
the right hand side of Eq. [3]. A filtering procedure will be considered
good if the corresponding point in the stability-information plane is
close to $\Sigma$.
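
The bootstrap procedure just described can be sketched as follows. This is
a minimal illustration under assumptions that the text does not fix:
replicas are drawn by resampling the $T$ records with replacement, the
stability is averaged over all ordered pairs of distinct replicas, and the
names `kl_gaussian`, `stability_information` and `filter_fn` are
hypothetical.

```python
import numpy as np

def kl_gaussian(sigma1, sigma2):
    """K(Sigma_1, Sigma_2) for zero-mean Gaussian densities."""
    n = sigma1.shape[0]
    logdet1 = np.linalg.slogdet(sigma1)[1]
    logdet2 = np.linalg.slogdet(sigma2)[1]
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))
    return 0.5 * (logdet2 - logdet1 + trace_term - n)

def stability_information(data, filter_fn, n_replicas=20, seed=0):
    """data: T x n array of records; filter_fn maps a correlation matrix
    to its filtered version. Returns (stability, information)."""
    rng = np.random.default_rng(seed)
    T = data.shape[0]
    samples, filtered = [], []
    for _ in range(n_replicas):
        rows = rng.integers(0, T, size=T)    # resample records with replacement
        c = np.corrcoef(data[rows], rowvar=False)
        samples.append(c)
        filtered.append(filter_fn(c))
    # information: average K(C_i, C_i^filt) over the replicas
    information = np.mean([kl_gaussian(c, f) for c, f in zip(samples, filtered)])
    # stability: average K(C_i^filt, C_j^filt) over pairs of distinct replicas
    stability = np.mean([kl_gaussian(filtered[i], filtered[j])
                         for i in range(n_replicas)
                         for j in range(n_replicas) if i != j])
    return stability, information
```

Two limiting filters help to read the resulting plane: a filter that
returns its input unchanged discards no information, whereas a filter that
always returns the identity matrix is perfectly stable but discards all
the correlations.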

There are many different filtering procedures. A widespread procedure is
based on random matrix theory (11). If the $n$ variables are independent
and with finite variance, then in the limit $T, n \to \infty$ with a fixed
ratio $Q = T/n \geq 1$, the eigenvalues of the Pearson sample correlation
matrix $C$ are bounded from above by the value
$\lambda_{max} = \sigma^2 (1 + 1/Q + 2\sqrt{1/Q})$, where $\sigma^2 = 1$
for correlation matrices. In some practical cases, for example in finance,
one finds that the largest eigenvalue $\lambda_1$ of the empirical
correlation matrix is definitely inconsistent with RMT. In these cases,
the null hypothesis is modified so that correlations can be explained in
terms of a one factor model and $\sigma^2 = 1 - \lambda_1/n$ (3). The
filtering procedure considered here works as follows (11). One
diagonalizes the correlation matrix and replaces all the eigenvalues
smaller than $\lambda_{max}$ in the diagonal matrix with their average
value. Then one transforms the modified diagonal matrix back to the
standard basis, obtaining a matrix $H^{RMT}$ of elements $h_{ij}^{RMT}$.
Finally, the filtered correlation matrix $C^{RMT}$ is the matrix of
elements $c_{ij}^{RMT} = h_{ij}^{RMT} / \sqrt{h_{ii}^{RMT} h_{jj}^{RMT}}$.
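
The steps of the RMT filtering can be sketched in NumPy as below. This is
an illustrative sketch rather than the code used for the results: the
function name `rmt_filter` and the `one_factor` switch are hypothetical,
and the switch simply selects between sigma^2 = 1 and
sigma^2 = 1 - lambda_1 / n in the threshold
lambda_max = sigma^2 (1 + 1/Q + 2 sqrt(1/Q)).

```python
import numpy as np

def rmt_filter(c, T, one_factor=False):
    """Filter the n x n correlation matrix c estimated from T records."""
    n = c.shape[0]
    q = T / n
    eigval, eigvec = np.linalg.eigh(c)          # eigenvalues in ascending order
    # variance not explained by the largest eigenvalue in the one factor case
    sigma2 = 1.0 - eigval[-1] / n if one_factor else 1.0
    lam_max = sigma2 * (1.0 + 1.0 / q + 2.0 * np.sqrt(1.0 / q))
    noisy = eigval < lam_max
    filtered_val = eigval.copy()
    if noisy.any():
        # replace the eigenvalues below the threshold with their average
        filtered_val[noisy] = eigval[noisy].mean()
    # back to the standard basis: H = V diag(filtered_val) V^T
    h = (eigvec * filtered_val) @ eigvec.T
    # rescale so that the diagonal is one: c_ij = h_ij / sqrt(h_ii h_jj)
    d = np.sqrt(np.diag(h))
    return h / np.outer(d, d)
```

Replacing the small eigenvalues with their average keeps the trace of the
matrix unchanged, and the final rescaling restores the unit diagonal
expected of a correlation matrix.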

In this paper we also consider filtering procedures based on hierarchical
clustering (13). Hierarchical clustering methods allow one to organize the
elements hierarchically in a rooted tree or dendrogram. The whole
information about the rooted tree can be stored in an $n \times n$ matrix
that can be considered as the output of the filtering procedure (13). In a
recent paper we have shown that this filtered matrix is a proper
correlation matrix at least when all of its elements are non-negative
numbers (13). Here we consider two widespread hierarchical clustering
techniques, specifically the Single Linkage Cluster Analysis (SLCA) and
the Average Linkage Cluster Analysis (ALCA) (T^). For more details about
these techniques see Refs rt6|; Il4l ).



Finally, we also consider a shrinkage filtering procedure (15; 16) in
which we construct a filtered matrix as

$$C^{SHR}(\alpha) = \alpha T + (1 - \alpha) C, \qquad (5)$$

where $0 < \alpha < 1$ and $T$ is a target matrix. As commonly done in the
financial literature, we choose the target matrix as a matrix with
$t_{ii} = 1$ and $t_{ij} = \langle c_{ij} \rangle$ for $i \neq j$. We
estimate the performance of the shrinkage procedure for different values
of $\alpha$. It is also interesting to note that there exist analytical
methods to obtain the optimal value $\alpha^*$ according to a cost
function based on the standard quadratic (or Frobenius) norm (17). In the
figures we also show the point (labeled $C^{SHR}(\alpha^*)$)
corresponding to the value $\alpha^*$.
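
As a small illustration, the shrinkage toward a constant-correlation
target can be written in a few lines. The sketch assumes that the average
in the target is taken over all off-diagonal elements of the sample
correlation matrix; the function name `shrink` is hypothetical, and the
analytical estimate of the optimal intensity is not reproduced here.

```python
import numpy as np

def shrink(c, alpha):
    """Return alpha * target + (1 - alpha) * c for a correlation matrix c."""
    n = c.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    # target: ones on the diagonal, average off-diagonal correlation elsewhere
    target = np.full((n, n), c[off_diag].mean())
    np.fill_diagonal(target, 1.0)
    return alpha * target + (1.0 - alpha) * c
```

With `alpha = 0` the sample matrix is returned unchanged, while
`alpha = 1` returns the target itself, so scanning `alpha` between these
limits traces a curve in the stability-information plane.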



In Fig. [1] we show the KL distance in the stability-information plane for
these filtering procedures applied to artificial data generated according
to two different models. The left panel shows the result for a block
diagonal model with 12 blocks, whereas the right panel shows the result
for a hierarchical model. This is a Hierarchically Nested Factor Model
(HNFM) with 23 factors, introduced in Ref. (13). In both panels we show
the points corresponding to the RMT, SLCA, and ALCA filtering procedures.
We also show the points corresponding to filtering procedures in which an
a priori fixed number of eigenvalues is retained and the remaining ones
are set equal to their average, and the points corresponding to the
shrinkage filtering procedure of Eq. [5] for different values of
$\alpha$. As expected, when one includes more and more eigenvalues in the
filtering procedure, the amount of discarded information decreases and the
filtered matrix becomes less and less stable. Interestingly, in the block
diagonal model a clear kink is observed close to the point corresponding
to the filtering procedure where 12 eigenvalues are included. The point
corresponding to the kink is also the closest to the optimal point and is
close to the point corresponding to the RMT filtering procedure outlined
above. This result shows that for simple block diagonal models the RMT and
KL procedures give consistent results. For the hierarchical model we
observe no kink when one varies the number of eigenvalues retained by the
filtering procedure. This fact indicates that filtering procedures based
on spectral analysis may have problems in filtering correlation matrices
with a complex structure. Moreover, the number of eigenvalues retained by
the RMT filtering procedure is not equal to the number of factors of the
HNFM. In the case of the hierarchical model the structure of eigenvalues
and eigenvectors is definitely more complicated than the one observed for
a block diagonal model. According to the right panel of Fig. [1], such a
structure is better recovered by filtering based on hierarchical
clustering techniques. Finally, the shrinkage method is able to achieve a
very good compromise between stability and information. From this analysis
it is possible to extract an optimal value of $\alpha$ minimizing the
distance from the point labeled with $\Sigma$. It should be noted that
this value in general does not coincide with the value $\alpha^*$ obtained
with the standard method by minimizing the Frobenius norm (17).

LiLLO PRINTED ON FEBRUARY 2, 2008

[Figure 1: two panels, each marked with an "increasing stability" arrow.]

Fig. 1: Stability of the filtered matrix ($x$ axis) against the amount of
information about the correlation matrix that is retained in the filtered
matrix ($y$ axis). The points labeled with C^^^(A) correspond to the
filtering procedure keeping a fixed number of eigenvalues. The number of
kept eigenvalues increases when one goes from the top left to the bottom
right corner. The points labeled with $C^{SHR}(\alpha)$ correspond to the
shrinkage procedure (see Eq. [5]) and the parameter $\alpha$ goes from 0
to 1 when one goes from the bottom right to the top left corner. The left
panel shows the result for a block diagonal model of $n = 100$ elements
divided in 12 groups and simulated for $T = 748$ points. The right panel
shows the result for a hierarchically nested model of 100 elements
following the HNFM with 23 factors of Ref. (ET

Fig. 2: Stability of the filtered matrix ($x$ axis) against the amount of
information about the correlation matrix that is retained in the filtered
matrix ($y$ axis) for $n = 100$ stocks of the NYSE in the period 2001-2003
($T = 748$). The points labeled with C^^-^(A) correspond to the filtering
procedure keeping a fixed number of eigenvalues. The number of kept
eigenvalues increases when one goes from the top left to the bottom right
corner. The points labeled with $C^{SHR}(\alpha)$ correspond to the
shrinkage procedure (see Eq. [5]) and the parameter $\alpha$ goes from 0
to 1 when one goes from the bottom right to the top left corner.

We now consider an application to a real system. We investigate the daily
returns of $n = 100$ highly capitalized stocks traded at the NYSE in the
period 2001-2003 ($T = 748$). In Fig. [2] we show the performance of
different filtering procedures in the stability-information plane. First
of all, it is worth noting that no kink is observed when one varies the
number of eigenvalues retained. This indicates that the block diagonal
matrix is far from being a faithful representation of financial
correlation matrices. RMT, SLCA and ALCA have different properties in
terms of stability and information (&). SLCA is the most stable even if it
is the least informative, whereas RMT is the least stable but the most
informative. ALCA has intermediate properties with respect to both
stability and information. As in the case of the model data, the shrinkage
procedure seems to outperform the other filtering techniques, even if in
this case a quantitative prediction of the optimal value of $\alpha$ is
more difficult due to the non-Gaussianity of financial returns. This point
will be discussed in the next section.



4. A first extension to non-Gaussian variables

The results obtained so far are valid for multivariate Gaussian variables.
However, in many real systems the random variables of interest are
non-Gaussian, and often the tails of their distribution are significantly
fatter than in the Gaussian case. A paradigmatic example is given by the
financial price returns discussed above. In this section we present some
numerical results obtained for a specific class of non-Gaussian variables.
A non-Gaussian multivariate distribution useful in describing financial
returns is the multivariate Student's t-distribution (18).

The multivariate distribution is

$$P(x) = \frac{\Gamma\left(\frac{n+\mu}{2}\right)}{\Gamma\left(\frac{\mu}{2}\right) \sqrt{(\mu\pi)^n |\Sigma|}} \, \frac{1}{\left(1 + \frac{1}{\mu} \sum_{i,j} x_i \Sigma^{-1}_{ij} x_j\right)^{\frac{n+\mu}{2}}}. \qquad (6)$$

The parameter $\mu$ describes the tail behavior of the marginal
distribution of any $x_i$, since $P(x_i) \sim x_i^{-\mu-1}$. A process
distributed as Eq. [6] can be obtained by setting
$x_i(t) = a(t) \eta_i(t)$, where the $\eta_i$ are multivariate Gaussian
variables with correlation matrix $\Sigma$ and $a(t)$ is a suitably
distributed random variable.

In order to check whether the results on the KL distance for Gaussian
distributions described above also hold for Student's t-distributions, we
have generated samples of $T$ records of a multivariate Student's
t-distribution of $n = 100$ variables. The correlation matrix of the
underlying Gaussian variables is the HNFM described in Ref. (14), and this
model is the same as the one used in Ref. (^) and in the right panel of
Fig. [1]. We made this choice in order to have a correlation matrix with a
non-trivial structure. By using Eq. [1] we then compute the KL distance
between the model correlation matrix and a sample correlation matrix
obtained with the Pearson estimator. We compare this value with the
expected value of $K(\Sigma, C_1)$ of Eq. [2] and we find that these
values are significantly different. Specifically, the value of Eq. [1],
obtained by using the sample Pearson correlation matrix, is larger than
the expected value given by Eq. [2] (see Fig. [3]). At first sight this
seems to indicate that the results on the KL distance for Gaussian
distributions cannot be applied to non-Gaussian variables. However, it is
known (18) that the Pearson estimator of the correlation matrix is not the
maximum likelihood estimator when the variables are non-Gaussian. In the
case of the Student's t-distribution of Eq. [6] there exists a recursive
equation for the maximum likelihood estimator $\hat{C}$, which is (18)

$$\hat{C}_{ij} = \frac{n+\mu}{T} \sum_{t=1}^{T} \frac{x_i(t) x_j(t)}{\mu + \sum_{p,q} x_p(t) \hat{C}^{-1}_{pq} x_q(t)}. \qquad (7)$$

Fig. [3] compares the KL distance of Eq. [1] between the model correlation
matrix $\Sigma$ and the two estimators, specifically the Pearson estimator
$C$ and the maximum likelihood estimator $\hat{C}$ of Eq. [7]. The figure
shows that, while $K(\Sigma, C)$ is not described by Eq. [2], the KL
distance $K(\Sigma, \hat{C})$ using the maximum likelihood estimator
$\hat{C}$ is well described by Eq. [2]. This result suggests that in some
cases one can extend the results obtained for Gaussian variables to
non-Gaussian variables, provided that the maximum likelihood estimator and
not the Pearson estimator is used in the computation of the KL distance.
An analytical extension of the KL distance to non-Gaussian distributions
is presented in Ref. (10) of this issue. One of the obtained results
confirms the conclusion drawn in this section about the Maximum Likelihood
Estimator of Student correlation matrices.

Fig. 3: KL distance of Eq. [1] between the model correlation matrix
$\Sigma$ and two estimators of the sample correlation matrix, specifically
the Pearson estimator $C$ and the maximum likelihood estimator $\hat{C}$
of Eq. [7]. The data are generated according to the multivariate Student's
t-distribution of Eq. [6] with $\mu = 4$. The correlation matrix of the
model is the one of a hierarchically nested model of 100 elements
following the HNFM with 23 factors obtained in Ref.
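
The recursive equation for the maximum likelihood estimator can be solved
by fixed-point iteration: starting from a positive definite guess, one
repeatedly evaluates C_ij = ((n + mu) / T) sum_t x_i(t) x_j(t) /
(mu + sum_pq x_p(t) C^{-1}_pq x_q(t)) until the matrix stops changing. The
sketch below is a hedged illustration, not the original code: the function
name `student_mle`, the identity starting point, the stopping tolerance
and the assumption that $\mu$ is known in advance are all choices made for
the example.

```python
import numpy as np

def student_mle(x, mu, max_iter=500, tol=1e-10):
    """Fixed-point iteration for the Student-t estimator.

    x: T x n array of records; mu: tail parameter, assumed known."""
    x = np.asarray(x, dtype=float)
    T, n = x.shape
    c = np.eye(n)                        # assumed starting point
    for _ in range(max_iter):
        # d[t] = sum_pq x_p(t) C^{-1}_pq x_q(t) for every record t
        d = np.einsum("ti,it->t", x, np.linalg.solve(c, x.T))
        # records with a large d[t] (tail events) are down-weighted
        weights = (n + mu) / (mu + d)
        c_new = (x * weights[:, None]).T @ x / T
        if np.max(np.abs(c_new - c)) < tol:
            return c_new
        c = c_new
    return c
```

Compared with the Pearson estimator, each record enters with a weight that
decreases with its distance from the origin, which limits the influence of
the fat tails on the estimated matrix.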

5. Conclusions 

We have considered the application of the KL distance to the measurement
of the performance of correlation matrix filtering procedures in giving
reliable and stable estimates of the underlying correlation matrix. Our
analysis suggests that the optimal number of eigenvalues to be retained in
filtering correlation matrices by means of spectral procedures is close to
the number of eigenvalues indicated by RMT. Our investigation of models
also indicates that spectral filtering procedures are slightly more
efficient than hierarchical clustering filtering procedures in filtering
"separable" systems, like those described by block diagonal models,
whereas the latter work better for systems with a clear hierarchical
structure of correlations. We have also shown that the shrinkage approach
is very efficient in filtering a sample correlation matrix, although the
estimate of the optimal shrinkage intensity in terms of the Frobenius norm
is far from being optimal in terms of the KL distance. Finally, we have
suggested a possible extension of our method to non-Gaussian variables.